Abstract. We present several steps towards large formal mathematical
wikis. The Coq proof assistant together with the CoRN repository
are added to the pool of systems handled by the general wiki system
described in [10]. A smart re-verification scheme for the large formal
libraries in the wiki is suggested for Mizar/MML and Coq/CoRN, based
on recently developed precise tracking of mathematical dependencies. We
propose to use features of state-of-the-art filesystems to allow real-time
cloning and sandboxing of the entire libraries, allowing also to extend the
wiki to a true multi-user collaborative area. A number of related issues
are discussed.

1 Overview

This paper proposes several steps towards large formal mathematical wikis. In 
Section 3 we describe how the Coq proof assistant together with the CoRN 
repository are added to the pool of systems fully handled by the wiki architecture 
proposed in [10], i.e., allowing both web-based and version-control-based updates
of the CoRN wiki, using smart (parallelized) verification over the whole CoRN
library as a consistency guard. Because the task of large-scale library refactoring
is still resource-intensive, an even smarter re-verification scheme for the large
formal libraries is suggested for Mizar/MML and Coq/CoRN, based on precise
tracking of mathematical dependencies that we started to develop recently for
the Coq and Mizar proof assistants, see Section 4. We argue for the need of
an architecture allowing easy sandboxing and thus easy cloning of the whole
large libraries. This poses technical challenges in the real-time wiki setting, as
cloning and re-verification of large formal libraries can be both a time and space
consuming operation. An experimental solution based on the use of modern 
filesystems (Btrfs or ZFS in our case) is suggested in our setting in Section 5. 
Solving the problem of having many similar sandboxes and clones despite their 
large sizes allows us to use the wiki as a hosting platform for many collaborating 
users. We propose to use the gitolite system for this purpose, and explain the 
overall architecture in Section 6. As a corollary to the architecture based on 
powerful version control systems, we get distributed wiki synchronization almost 
for free. In Section 7 we conduct an experiment synchronizing our wikis on servers
in Nijmegen and in Edmonton. Finally we discuss a number of issues related to 
the project, and draw recommendations for existing proof assistants in Section 8. 



2 Introduction: Developing Formal Math Wikis 

This paper describes a third iteration in the MathWiki development.* An agile
software development cycle typically includes several (many) loops of require- 
ments analysis, prototyping, coding, and testing. A wiki for formal mathematics 
is an example of a strong need for the agile approach: It is a new kind of software 
taking ideas from wikis, source-code hosting systems, version control systems, 
interactive verification tools and specialized editors, and strong semantic-based 
code/proof assistants. Building formal wikis seems to interact significantly with
the development of proof assistants, and their mutual feedback influences the
development of both. For example, a number of changes have already been made in
the last year to the Mizar XML and HTML-ization code, and to the MML
verification scripts, to accommodate the emerging wiki functionalities. See below
for changes and recommendations to the related Coq mechanisms, and other 
possibly wiki-handled proof assistants. Also, see below in Section 4 for the new 
wiki functions that are allowed when precise dependency information about the 
formal libraries becomes available for a proof assistant. 

The previous two iterations of our wiki development were necessarily
exploratory; our work then focused on implementing the reasonably recognized
cornerstone features of wikis. We used version control mechanisms suitable both
for occasional users (using web interfaces) and for power users (working typically
locally), and allowing also easy migration to future more advanced models based
on the version-controlled repositories. We supplied HTML presentations of our
content, enriched in various ways to make it suitable for formal mathematics
(e.g., linking and otherwise improved presentation of definitions and theorems,
explicit explanation of current goals of the verifier, etc.). One novel problem in
the formal mathematical context was the need to enforce validity checks on the
submitted content; for this, we developed a model of fast (parallelized)
automated large-scale verification, done consistently for the largest formal library
available.

* The first was an experimental embedding of the CoRN and MML repositories inside
the ikiwiki (http://ikiwiki.info/) system, and the second iteration is described
in our previous paper [10].

The previous implementations already provide valuable services to the proof 
assistant users, but we focused initially only on the Mizar proof assistant. While 
library-scale refactoring and proof checking is a very powerful feature of the
formal wikis (differentiating them, for example, from code repositories), it is still
too slow for large libraries to allow its unlimited use in an anonymous setting. We
have observed that users are often too shy to edit the main official wiki, as
their actions will be visible to the whole world and will influence the rest of the
users. A more structured, hierarchical, and private way of developing is needed,
together with mechanisms for collaboration and for propagating changes from private
experiments to the main public branches. Our limited implementation provided
real-world feedback for the next steps described in this paper:

— We add Coq with CoRN to the pool of managed systems. 

— We describe smarter and faster verification modes for the wikis, that we
started to implement within proof assistants exactly because of the feedback 
from previous wiki instances. 

— We add a more fine-grained way to edit formal mathematical texts, making 
it easier to detect limited changes (and thus avoid expensive re-verification).

— We manage and control users and their rights, allowing the wiki to be exposed 
to the world in a structured way not limited to a trusted community of users. 

— A mechanism in which the users get their own private space is proposed and 
tested, which turns out to be reasonably cheap thanks to usage of advanced 
filesystems and its crosslinking with the version control model. 

— A high-level development model is suggested for the formal wiki, designed 
after a recently proposed model [6] for version-controlled software develop- 
ment. We extend that model by applying different correctness policies, which 
helps to resolve the tradeoffs between correctness, incrementality, and unified 
presentation discussed in [10]. 

One aim of our work is to try to improve the visibility and usability of formal 
mathematics. The field is sorely lacking an attractive, simple, discoverable way 
of working with its tools. The formal mathematics wiki we describe here is one 
project designed to tackle this problem. 

3 The Generalized Formal Wiki Architecture, and its 
Coq and CoRN Instance 

One of the goals of initially developing a wiki for one system (Mizar) was to find 
out how much work is needed for a particular proof assistant so that a first-cut 
formal wiki could be produced. An advantage of that approach was that as Mizar 
developers we were able to quickly develop the missing tools, and adjust the
existing ones. Another advantage of focusing on Mizar initially was that the 
Mizar Mathematical Library (MML) is one of the largest formal mathematical 
libraries available, thus forcing us to deal early on with efficiency issues that go
far beyond toy-system prototypes, and are seen in other formal libraries to a
lesser extent.

The feasibility of the Mizar/MML wiki prototype suggested that our general 
architecture should be reasonably adaptable to any formal proof assistant pos- 
sessing certain basic properties. The three important features of Mizar making 
the prototype feasible seem to be: batch-mode (preferably easily parallelizable) 
verification; fast dependency extraction (allowing some measure of intelligence 
in library re-compilation based on the changed dependencies); and availability of 
tools for generating HTML representations of formal texts. With suitable adap- 
tation, then, any proof assistant with these properties can, in principle, be added 
to our pool of supported systems. 

It turns out that the Coq system, and specifically the Coq Repository at 
Nijmegen (CoRN) formal library, satisfies these conditions quite well, allowing
us to largely re-use the architecture built for Mizar in a Coq/CoRN wiki.

3.1 HTML presentation of Coq content with coqdoc 

We found that the coqdoc tool, part of the standard Coq distribution, provides 
a reasonable option for enriched HTML presentation of Coq articles. With some 
additional work, it can be readily used for the wiki functionalities. Note that 
an additional layer (called Proviola) on top of coqdoc is being developed [8], 
with the goal of eventually providing better presentation and other features for 
interacting with Coq formalization in the web setting. As in the case of Mizar 
(and perhaps even more with nondeclarative proofs such as those of Coq), much 
implicit information becomes available only during proof processing, and such 
information is quite useful for the readers: For example, G. Gonthier, a Coq 
formalizer heading the Math Components project, asserts that his advanced
proofs are human-readable, however only in the special environment provided 
by the chosen Coq user interface. This obviously can be improved, both by 
providing better (declarative) proof styles for Coq (in the spirit of [5]), and by 
exporting the wealth of implicit proof information in an easily consumable form, 
e.g., in the way Mizar does [9].

Unlike the Mizar HTML-ization tools (with the possible exception of the MML
Query tool [4]), the coqdoc tool provides some additional functionalities like au- 
tomated creation of indexes and tables of contents, see for example Figure 1 for 
the CoRN wiki contents page. This can be used for additional useful presenta- 
tion of the Coq wiki files, and is again a motivation (for Mizar and other proof 
assistants) to supply such tools for their wikis. 

3.2 Batch-mode processing and dependency analysis with Coq

Coq allows both interactive and batch-mode verification (using the coqc tool),
and also provides a special tool (coqdep) for discovering dependencies between
Coq files, suitable for Makefile-based compilation and its parallelization. A
difference between CoRN and the MML is that the article structure is not flat in
CoRN (in Mizar, all articles are just kept in one "mml" directory), and an
arbitrarily deep directory structure has to be allowed. This poses certain
challenges when adding new files to CoRN, and taking care of their proper
compilation and HTML presentation. The current solution is that the formal
articles are allowed to live in nested subdirectories, while the corresponding
HTML files live in just one (flat) directory (this is how the coqdoc documentation
is traditionally produced), and the correspondence between the HTML and the
original article (necessary for editing operations) is recovered by relying on the
coqdoc names of the HTML files, which contain the directory (module) structure.
This is a good example of a real-world library feature that complicates the life
of formal wiki developers: it would be much easier to design a flat-structured
wiki on paper; however, if we want to cater for real users and existing libraries,
imperfect solutions corresponding to the real world have to be used.
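
The name-based correspondence between flat HTML files and nested source files can be illustrated with a minimal sketch. It assumes that the HTML file name is the dotted module path followed by ".html", and that the library is compiled under a single logical prefix; the prefix "CoRN" and the file "reals/CReals.v" are used purely for illustration, and the function names are hypothetical rather than part of the wiki code.

```python
from pathlib import PurePosixPath


def html_to_source(html_name: str, logical_prefix: str = "CoRN") -> PurePosixPath:
    """Map a flat HTML file name back to a nested .v path (illustrative only)."""
    if not html_name.endswith(".html"):
        raise ValueError(f"not an HTML file name: {html_name}")
    parts = html_name[: -len(".html")].split(".")
    # Drop the assumed logical prefix of the library, if present.
    if parts and parts[0] == logical_prefix:
        parts = parts[1:]
    if not parts or not all(parts):
        raise ValueError(f"cannot recover a module path from: {html_name}")
    return PurePosixPath(*parts).with_suffix(".v")


def source_to_html(source: str, logical_prefix: str = "CoRN") -> str:
    """Inverse mapping: nested .v path to the flat HTML file name."""
    path = PurePosixPath(source)
    if path.suffix != ".v":
        raise ValueError(f"not a Coq source file: {source}")
    return ".".join([logical_prefix, *path.with_suffix("").parts]) + ".html"


print(html_to_source("CoRN.reals.CReals.html"))
print(source_to_html("reals/CReals.v"))
```

Such a mapping is only unambiguous as long as directory and file names contain no dots themselves, which is one more reason why the flat-versus-nested mismatch is an imperfect solution.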

Interestingly, the structure of the dependencies in the CoRN repository dif- 
fers significantly from the MML. MML can really benefit a lot from large-scale 
parallelization of the verification and HTML-ization, probably because it con- 
tains many different mathematical developments that are related only indirectly 
(e.g., by being based in set theory, using some basic facts about set-theoretic 
functions and relations, etc.). This is far from true for the CoRN library. Paral- 
lelization of the CoRN verification helps comparatively little, quite likely because 
the CoRN development is very focused. Thus, even though the CoRN library is 
significantly smaller than the MML (about a quarter of the size of the MML), 
the library re-verification times are not significantly different when verification
is parallelized. This is a motivation for the work on finer dependencies described
in Section 4.
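
Why parallelization helps a broad library more than a focused one can be seen from a small sketch: with unlimited processors, verification time is bounded below by the most expensive chain of dependencies, so the best possible speedup is the total work divided by the cost of that chain. The two toy graphs and the uniform costs below are invented for illustration and do not model the real MML or CoRN.

```python
from typing import Dict, List


def critical_path(deps: Dict[str, List[str]], cost: Dict[str, float]) -> float:
    """Cost of the most expensive dependency chain in an acyclic graph."""
    finish: Dict[str, float] = {}

    def finish_time(node: str) -> float:
        if node not in finish:
            # A file can start only after all of its prerequisites are done.
            start = max((finish_time(d) for d in deps.get(node, [])), default=0.0)
            finish[node] = start + cost[node]
        return finish[node]

    return max(finish_time(node) for node in cost)


def max_speedup(deps: Dict[str, List[str]], cost: Dict[str, float]) -> float:
    """Upper bound on the speedup from parallel verification."""
    return sum(cost.values()) / critical_path(deps, cost)


cost = {name: 1.0 for name in "abcde"}
# Many developments sharing only a common base (hypothetical, "broad" library).
broad = {"b": ["a"], "c": ["a"], "d": ["a"], "e": ["a"]}
# One focused development where each file builds on the previous one.
chain = {"b": ["a"], "c": ["b"], "d": ["c"], "e": ["d"]}

print(max_speedup(broad, cost))  # 2.5
print(max_speedup(chain, cost))  # 1.0
```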



3.3 New CoRN development with SSReflect 

A significant issue for wiki development turns out to be the new experimental 
version of CoRN, developed at Nijmegen based on the Math Components SS- 
Reflect library. This again demonstrates some of the real-world choices that we 
face as wiki developers. The first issue is binary incompatibility. The SSReflect 
(Math Components) project has introduced its own special version of the coqc 
binary, and standard coqc is no longer usable with it. Obviously, providing a 
common wiki for the Coq Standard Library and the Math Components project 
(even though both are officially Coq-based) is thus (strictly speaking) a fiction. 
One possible solution is that the compiled (.vo) files might still be compatible,
thus allowing us to provide some clever recompilation mechanisms for the
combined libraries. The situation is even worse with the developing version of CoRN,
which relies (due to its advanced exploration of Coq type classes [7]) on both a 
special (fixed) version of the coqc binary, together with a special (fixed) version 
of the SSReflect library. This not only makes a joint wiki with the Coq Standard 
Library hard to implement, but it also prevents a joint wiki with the Math Com- 
ponents project (making changes to the SSReflect library, which has to be fixed 
for CoRN). To handle such real issues, separate private clones and branches of
the wiki, used for developing certain features and for other experiments, will have
to be used. This is one of the motivations for our general proposal in Sections 5
and 6. It is noteworthy that older versions of CoRN also relied on their
own Coq binary, including custom ML code. However, the features implemented 
by custom ML code were partly provided by newer versions of Coq, and partly 
reimplemented in Coq's Ltac language. So there is a pattern here of new
developments requiring custom Coq binaries, which has to be taken into account
when developing real-world wikis. 

4 Using Fine-grained Dependency Information for a 
Large Formal Wiki 

In order to deal with the efficiency issues mentioned in previous sections, we have 
started to develop tools allowing much finer dependency tracking, and thus much 
finer and leaner recompilation modes, than is currently possible with Mizar and 
Coq. This work is reported in [1]. To summarize, we add special
dependency-tracking code to Coq, which can now track most of the mutual dependencies
of Coq items (theorems, definitions, etc.), and extract the direct and
transitive graph of dependencies between these items. Similarly, but using a different
technique, we extract such fine dependencies from the Mizar formalizations. For
Mizar this is done by advanced refactoring of the Mizar articles into one-item
micro-articles, and computing their minimal dependencies by a brute-force
minimization algorithm. The result of the algorithm again provides us for each item
I with the precise information about which other Mizar items the item I
depends on. This information is again compiled into graphs of direct and indirect
dependencies. The Mizar wiki already allows viewing of fine theorem and scheme
dependencies aggregated for the articles, see Figure 2 for those of the CARD_LAR
article.
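
The idea of brute-force minimization can be sketched as follows. This is a naive illustration and not the algorithm of [1]: it assumes a black-box check `verifies` (a hypothetical stand-in for running the verifier on a micro-article with a given set of available items) and assumes that adding items never breaks a verification that already succeeds.

```python
from typing import Callable, Iterable, List


def minimize_dependencies(
    deps: Iterable[str], verifies: Callable[[List[str]], bool]
) -> List[str]:
    """Greedily drop every dependency whose removal keeps the item verifiable."""
    needed = list(deps)
    if not verifies(needed):
        raise ValueError("the item does not verify even with all dependencies")
    for dep in list(needed):
        trial = [d for d in needed if d != dep]
        # Keep the smaller set only if the verifier still accepts the item.
        if verifies(trial):
            needed = trial
    return needed


# Toy verifier: the item really needs only the (hypothetical) items B and D.
def toy_verifier(available: List[str]) -> bool:
    return {"B", "D"} <= set(available)


print(minimize_dependencies(["A", "B", "C", "D"], toy_verifier))
```

Each candidate removal costs one verifier run, so the number of runs is linear in the number of initially assumed dependencies; this is affordable only because the micro-articles are small.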

Fig. 2. Aggregated fine theorem and scheme dependencies for article CARD_LAR



4.1 Speeding up (re) verification 

It turns out that such fine dependencies have the potential to provide significant 
speedups for expensive library refactorings. The following Table 1 from [1] shows 
the dependency statistics and comparison for the CoRN and MML (first 100 
articles) libraries. For example, the number of direct dependency edges computed 
by the fine-grained method in MML drops to 3% in comparison with the number 
of direct dependencies assumed by the traditional coarse file-based dependencies. 
This is obviously a great opportunity for the formal wiki to provide very fast
(and also much more parallelizable) verification and presentation services to the
authors of formal mathematics.
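
To make the statistics named in Table 1 (Deps, TDeps, P, ARL, MRL) concrete, the following sketch computes them for a tiny invented dependency graph. It assumes an acyclic graph in which every item appears as a key, and it counts as "recompiled" only the items that depend on the changed item, not the changed item itself; the exact conventions used in [1] may differ.

```python
from statistics import mean, median
from typing import Dict, Set


def transitive_deps(deps: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """All direct and indirect dependencies of every item (acyclic input)."""
    closure: Dict[str, Set[str]] = {}

    def reach(item: str) -> Set[str]:
        if item not in closure:
            found: Set[str] = set()
            for dep in deps[item]:
                found.add(dep)
                found |= reach(dep)
            closure[item] = found
        return closure[item]

    for item in deps:
        reach(item)
    return closure


def dependency_stats(deps: Dict[str, Set[str]]) -> Dict[str, float]:
    closure = transitive_deps(deps)
    n = len(deps)
    tdeps = sum(len(c) for c in closure.values())
    # Number of items that must be rechecked when a given item changes.
    recompiled = [sum(1 for c in closure.values() if item in c) for item in deps]
    return {
        "Deps": sum(len(d) for d in deps.values()),
        "TDeps": tdeps,
        # In an acyclic graph each unordered pair is related in at most one direction.
        "P": tdeps / (n * (n - 1) / 2),
        "ARL": mean(recompiled),
        "MRL": median(recompiled),
    }


toy = {"a": set(), "b": {"a"}, "c": {"b"}, "d": {"a"}}
stats = dependency_stats(toy)
print(stats)
```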

Deps   Number of dependency edges
TDeps  Number of transitive dependency edges
P      Probability that, given two randomly chosen items, one depends (directly or
       indirectly) on the other, or vice versa
ARL    Average number of items recompiled if one item is changed
MRL    Median number of items recompiled if one item is changed

Table 1. Statistics of the item-based and file-based dependencies for CoRN and MML

4.2 Delimited editing

The wiki now also exploits fine-grained dependency information, for the case of
Mizar, by providing delimited text editing. The idea is to present the user with
a way to edit parts of a formal mathematical text, rather than an entire article.
This is a formal analog of the "Edit this section" button in Wikipedia. The task
is to divide a text into its constituent pieces, and provide ways of editing only
those pieces, leaving other parts intact. The practical advantage of such a feature
is that we can be sure that edits to the text have been made only in a small part
of the text that can have only a limited impact on other parts. When we know
that an edit is made only to, say, the proof of a single theorem, then we do not
need to check other theorems in the text; the text as a whole is correct just in
case the new proof is correct. If the statement of a theorem itself is modified, it
is sufficient to re-check only those other parts of the article that explicitly use
or otherwise directly depend on this theorem. See Figure 3 for an example of
delimited editing of theorem CARD_LAR:2.
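
The re-checking rule for delimited edits can be summarized in a few lines. The sketch below is only a model of the rule as stated in the text: the item labels T1 to T4 and the map `direct_users` (from an item to the items that explicitly use it) are hypothetical, and in the wiki this information would come from the fine-grained dependency data.

```python
from typing import Dict, Set


def items_to_recheck(
    edited: str, kind: str, direct_users: Dict[str, Set[str]]
) -> Set[str]:
    """Items that need re-verification after a delimited edit of one item."""
    if kind == "proof":
        # Only the proof changed: the statement seen by other items is unchanged.
        return {edited}
    if kind == "statement":
        # The statement changed: also re-check everything that uses it directly.
        return {edited} | set(direct_users.get(edited, set()))
    raise ValueError(f"unknown kind of edit: {kind}")


direct_users = {"T1": {"T2"}, "T2": {"T3", "T4"}}
print(sorted(items_to_recheck("T2", "proof", direct_users)))
print(sorted(items_to_recheck("T2", "statement", direct_users)))
```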

5 Scaling Up 

In Section 6 we propose a wiki architecture that caters for many users and 
many related developments, using the gitolite tool, and authentication policies 
for repository clones and branches. As mentioned in Section 3, this seems to 
be a pressing real-world issue, necessary for the various collaborative aspects of 
formalization. Such a solution, however, forces us to deal with many versions 
of the repositories, which are typically very large. The Mizar HTML itself is 
several gigabytes in size, and in order to be able to quickly re-compile the formal 
developments, we also have to keep all intermediate compilation files around. In 
addition to that, our previous implementation needed the space for at least two 
versions of all these files, so that we could quickly provide a fresh sandbox (with 
all the intermediate files in it) for a recompilation of only the newly modified 
articles, and so that we were able to quickly return to a clean saved state if a 
re-compilation in the sandbox fails. Thus, the size of the Mizar wiki could reach 
almost 20 Gigabytes. 

Fig. 3. Delimited editing of theorem CARD_LAR:2



It is clear that with these sizes, it becomes impractical to provide a pri- 
vate clone or a feature clone for hundreds (or even dozens) of interested users. 
Fortunately, we can solve this by using the copy-on-write capabilities of mod- 
ern filesystems: these mechanisms enable us to create time- and space-efficient 
copies of branches in the wiki, storing only the changes with respect to the 
original branch. 

Currently, there are several copy-on-write filesystems under active develop- 
ment; a well-known example is the ZFS filesystem, which was first released by 
Sun Microsystems in 2005. Unfortunately, although ZFS is open-source, license 
incompatibilities prevent it from being distributed as part of the Linux kernel 
(which we use to host the MathWiki system). More recently, work has begun 
on a filesystem called Btrfs, which aims to bring many of the features of ZFS
to Linux. Included in the mainline kernel in 2009, it is not yet as stable as tra- 
ditional Linux filesystems, but its copy-on-write snapshotting is already usable 
for our purposes. The functionality provided by Btrfs can be combined with the 
architecture suggested in Section 6 to create a system that will scale to large 
numbers of users and branches, which is described below. 

The git repositories themselves are typically quite small, as they are com- 
pressed, contain only the source files (not the intermediate and HTML files), 
and additionally git allows reference sharing. Thus the main problem is the
working copies that need to be present on the server for browsing and fast
re-compilation. However, these copies will typically share a lot of content, because
the users typically modify only a small part of the large libraries, and typically
start with the same main branch.

Our solution is to implement the cloning of new user repositories using Btrfs 
snapshots. That is, we keep a working copy of the main repository in a separate 
Btrfs volume, and create a snapshot (a writeable clone) of this whenever a user 
clones the repository. Due to the copy-on-write nature of Btrfs, this operation
is efficient in terms of time and space: creating a snapshot takes 0.03 seconds
(on desktop-class hardware), and 6 KB of disk space, even for cloning very
large (10 GB) volumes such as the one containing the Mizar wiki. Thus, we can now
provide space for a very large number of clones and versions, and do it practically
instantaneously.
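
A minimal sketch of how the snapshot-based cloning and the later discarding of a snapshot might be driven from a script is given below. It only wraps the standard `btrfs subvolume snapshot` and `btrfs subvolume delete` commands; the paths `/wiki/main` and `/wiki/clones/alice` are hypothetical, and by default the commands are only printed, since actually running them requires a Btrfs filesystem and sufficient privileges.

```python
import subprocess
from typing import List


def snapshot_command(source: str, dest: str) -> List[str]:
    """Command creating a writable snapshot of the subvolume `source` at `dest`."""
    return ["btrfs", "subvolume", "snapshot", source, dest]


def delete_command(subvolume: str) -> List[str]:
    """Command discarding a snapshot, e.g. after a failed recompilation."""
    return ["btrfs", "subvolume", "delete", subvolume]


def run(command: List[str], dry_run: bool = True) -> None:
    if dry_run:
        # Only show what would be executed.
        print(" ".join(command))
    else:
        subprocess.run(command, check=True)


# Hypothetical layout: one subvolume for the main wiki, one snapshot per clone.
run(snapshot_command("/wiki/main", "/wiki/clones/alice"))
run(delete_command("/wiki/clones/alice"))
```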

As the snapshot is modified, disk usage grows proportionally to the size of 
the changes. Changing a file's metadata (e.g., updating its last-modified-time, 
as required for our fast recompilation feature) costs 10 KB on average (this 
is a one-time cost, paid only when the user really makes the effort and does 
some acceptable changes). Modifying the content of a file increases disk usage 
by the amount of newly written data, plus a fixed overhead of about 12 KB. We 
have found that in order to maximize the amount of sharing between related 
snapshots, it is advisable to disable file-access-time updates on the filesystem.

Each time a repository fails to compile, and needs to be restored, we can
roll back to a previous state by discarding the latest snapshot. This is also a
fast operation, typically taking less than a second, and it saves us from having
to maintain another 10 GB sandbox for possibly destructive operations, and from
periodically using (slower) file-based synchronization (rsync) with the main wiki.

The following Table 2 documents the scalability of Btrfs and its usability
in our setting. It summarizes the following experiment: The main public wiki is
populated with the whole Mizar library, which together with all the intermediate
and HTML files takes about 10 GB of an (uncompressed) Btrfs subvolume. Then
we emulate 10, 100, and 200 experimental wiki clones based on the main public
wiki. Each of the clones starts as a snapshot of the main public wiki, to which a
user decides to add his new development (Mizar article) depending on a nontrivial
part of the library (article CARD_1 [3] was used). The article is then verified
and HTML-ized, triggering also a library-wide update of various fine-dependency
indexes and HTML indexes. This process is done by running a full-scale make
process on the whole library, requiring reading of modification times of tens of
thousands of files in the newly created clone. Despite that, the whole process is
reasonably fast and real-time, and scales well even with hundreds of clones. The
whole operation takes 6.9 seconds per clone on average for 10 clones, and 7.2
seconds on average when creating 200 clones in a series. The average growth in
overall filesystem consumption (for the new article, its intermediate files, and
updated indexes) is 5.22 MB per clone when testing with 10 clones, and 5.26 MB
when testing with 200 clones. To summarize, the total cost of providing 200
personalized 10 GB clones with a newly verified article in them is only about
1 GB of storage.

Table 2. Time and space data for 10, 100, and 200 clones with a new article verification

clones   time (s)   disk usage (MB)
                    data   metadata   total
10       6.9        4.71   0.51       5.22
100      7.0        4.71   0.55       5.26
200      7.2        4.71   0.55       5.26
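
The summary figure of about 1 GB for 200 clones follows directly from the per-clone growth reported in Table 2. The short calculation below repeats it and, for comparison, shows the space that 200 full copies of a 10 GB working tree would need without copy-on-write; the function names are ours and purely illustrative.

```python
def snapshot_storage_mb(clones: int, growth_per_clone_mb: float) -> float:
    """Extra space used by copy-on-write clones, given the per-clone growth."""
    return clones * growth_per_clone_mb


def full_copy_storage_gb(clones: int, working_tree_gb: float) -> float:
    """Space needed if every clone were a complete copy of the working tree."""
    return clones * working_tree_gb


with_snapshots = snapshot_storage_mb(200, 5.26)   # about 1052 MB, i.e. roughly 1 GB
without_sharing = full_copy_storage_gb(200, 10)   # 2000 GB

print(f"snapshots: {with_snapshots:.0f} MB, full copies: {without_sharing:.0f} GB")
```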



6 Many Users, Many Branches 

The current system now presents one version of CoRN and the MML to the entire 
community. To help make the site more attractive and useful, we would like the 
wiki to be a place where one can store one's work-in-progress; one would store 
one's own formal mathematical texts and have a mechanism for interacting with 
other users and their work. One could then track one's own progress online, and 
possibly follow other people's work as well. It would be akin to a GitHub for 
formal mathematics. In this section we describe the Git-based infrastructure for 
implementing multiple users. 

The idea of extending a wiki such as ours from one anonymous user to a 
secure, multiuser one, maintaining security while preserving time and space ef- 
ficiency, presents a fair number of technical challenges. One basic question: how 
do we extend our Git-based model? Would we store one repository for everyone, 
with different branches for each user, or do we give each user his own repository? 
How would one deal with ensuring that different users don't interfere with the 
work of other users? How do we deal with multiple people trying to access a 
repository (or repositories)? Note that this also leads to the problem of storing
many different (but only slightly different) copies of large formal corpora, solved
in the previous section by using an advanced filesystem.

For managing multiple users we opted for a solution based on the gitolite
system. gitolite adds a layer to Git that allows multiple users to access
a pool of repositories, guarded by SSH keys. With gitolite one can even set up
fine-grained control over particular branches of repositories. One can specify that
a certain repository (or a particular branch) is unavailable to a user (or group
of users), readable but not writable, or read-writable. gitolite makes transparent
use of the SSH infrastructure; once a user has provided an RSA public key to us (the
registration page is shown in Figure 4), he is able to carry out these operations
via the web page or through the traditional command-line interface to Git.

In addition to supporting multiple users, we also want to permit multiple 
branches per user. The following Git branching policy described by V. Driessen [6] 
provides a handful of categories of branches: 

We consider origin/master to be the main branch where the source
code of HEAD always reflects a production-ready state. We consider
origin/develop to be the main branch where the source code of HEAD
always reflects a state with the latest delivered development changes for
the next release. Some would call this the "integration branch". This is
where any automatic nightly builds are built from.

Fig. 4. Registration page at our wiki

In addition to main and developer branches, we intend to support other kinds 
of branches: feature (for work on a particular new feature), release (for official 
releases of the formal mathematical texts), and hotfix (fixes for critical bugs). 
A gitolite access policy implementing a Driessen-style model can be seen in Figure 5.

The intention of this policy is to divide users into certain classes and permit 
certain kinds of operations (creating a branch, reading it, reading-and-writing
to it). The user classes have the following meaning: 

— admin: can do anything, has root access to the server 

— superuser: can do arbitrary operations on the wikis taking arbitrary times, 
can update binaries, etc.

— maintainer: can update the main stable wiki, start/close the release and 
hotfix branches 

— developer: can update the develop clone, start/close feature branches

— user: limited to his userspace, and inexpensive operations 

— anonymous: limited to the anonymous user space
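
The effect of the policy in Figure 5 on the user classes can be explored with a small model. The sketch below is a simplified reading of that policy and not gitolite's actual rule-evaluation algorithm: it treats RW+ as implying read access, matches repository names with Python regular expressions, and omits the per-user `user/CREATOR/...` rule; the function `may` and the repository names in the examples are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

SU, MT, DEV, USR, ANON = "superusers", "maintainers", "developers", "users", "anonymous"

# (repository name pattern, permission -> user classes), following Figure 5.
POLICY: List[Tuple[str, Dict[str, Set[str]]]] = [
    (r"main", {"RW+": {SU, MT}, "R": {DEV, USR, ANON}}),
    (r"devel", {"RW+": {SU, MT, DEV}, "R": {USR, ANON}}),
    (r"feature/[a-zA-Z0-9].*",
     {"C": {SU, MT, DEV}, "RW+": {SU, MT, DEV}, "R": {USR, ANON}}),
    (r"(release|hotfix)/[a-zA-Z0-9].*",
     {"C": {SU, MT}, "RW+": {SU, MT}, "R": {DEV, USR, ANON}}),
]


def may(user_class: str, repo: str, action: str) -> bool:
    """Whether a user class may 'read', 'write' or 'create' the given repository."""
    for pattern, rules in POLICY:
        if re.fullmatch(pattern, repo):
            if action == "create":
                return user_class in rules.get("C", set())
            if action == "write":
                return user_class in rules.get("RW+", set())
            if action == "read":
                return user_class in rules.get("RW+", set()) | rules.get("R", set())
            raise ValueError(f"unknown action: {action}")
    return False  # no rule matches: deny


print(may(DEV, "main", "write"))           # False
print(may(DEV, "feature/sorting", "create"))  # True
print(may(ANON, "release/1.0", "read"))    # True
```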

@all = @superusers @maintainers @developers @users @anonymous

repo main
    RW+ = @superusers @maintainers
    R   = @developers @users @anonymous

repo devel
    RW+ = @superusers @maintainers @developers
    R   = @users @anonymous

repo feature/[a-zA-Z0-9].*
    C   = @superusers @maintainers @developers
    RW+ = @superusers @maintainers @developers
    R   = @users @anonymous

repo (release|hotfix)/[a-zA-Z0-9].*
    C   = @superusers @maintainers
    RW+ = @superusers @maintainers
    R   = @developers @users @anonymous

repo user/CREATOR/[a-zA-Z0-9].*
    C   = @superusers @maintainers @developers @users
    RW+ = CREATOR
    R   = @all

Fig. 5. A gitolite policy for different kinds of wiki users

The name of the repository is now also an argument to a Git pre-commit or
pre-receive hook, which applies a particular verification policy to the repository.
For the main and develop repositories the policy should require full verifiability,
while other branches need not be fully verifiable, so that they function more like
work-in-progress notebooks. (Such branches present an interesting problem of
displaying, in a helpful way, possibly incorrect formal mathematical texts.)
gitolite also provides a locking mechanism for addressing the problem of concurrent
reads and writes.
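
The per-repository verification policy applied by the hook can be sketched as a simple decision function. This is an illustration under our own assumptions: the policy names "full" and "best-effort" and both function names are hypothetical, and how the repository name and the verification result reach the hook is left out.

```python
FULLY_VERIFIED_REPOS = {"main", "devel"}


def verification_policy(repo: str) -> str:
    """Policy name for a repository: strict for the shared ones, lax otherwise."""
    return "full" if repo in FULLY_VERIFIED_REPOS else "best-effort"


def accept_push(repo: str, library_verifies: bool) -> bool:
    """Decide whether a hook should accept an update of the repository."""
    if verification_policy(repo) == "full":
        # Shared repositories must never contain unverifiable content.
        return library_verifies
    # Work-in-progress repositories accept the update; errors are only reported.
    return True


print(accept_push("main", library_verifies=False))             # False
print(accept_push("feature/sorting", library_verifies=False))  # True
```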

With the registration form, the wiki users can now submit their RSA public 
keys to the wiki system. Doing so adds them to gitolite's user space, so that they 
can create new (frontend) Git repositories (e.g., by cloning some already existing 
repository). Doing so triggers the creation of a corresponding backend repository 
(gitolite manages the frontends directly, while the backends are managed indirectly 
by us via Git hooks and CGI). The backend repositories contain the full wiki 
populated with the intermediate files needed for fast re-compilation, 
and also with the final HTML representation of the contents, exactly 
as in the previous one-user, one-repository version of MathWiki. The 
backends themselves live in a filesystem setup described in Section 5 that reuses 
space using filesystem techniques such as copy-on-write. The result is quite a 
scalable platform, allowing many users, many (related) developments, different 
verification and authorization policies via gitolite and git hooks, and attempting 
to provide as fast verification and HTML-ization services as possible for a given 
proof assistant and library. 



Note however that tasks such as re-verifying a whole large library from 
scratch will always be expensive, and this cost should be made visible to the 
users. Apart from the many efficiency solutions mentioned so far, we are also 
experimenting with queuing of pending wiki operations. Such operations should 
be allowed to run in various superficial, fast modes of verification.^10 Users 
could have their own queues of jobs, and would be allowed to cancel jobs when 
they see that some other task makes them unnecessary. However, when committing 
to the devel or main branches, as mentioned, full verification should always be 
required. 
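
A per-user job queue of this kind might look like the following minimal sketch. The class and field names are hypothetical, and the mode names simply reuse the Mizar stages (exporter, analyzer, verifier) mentioned in the footnote; the only rule taken from the text is that main and devel always get full verification.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int
    repo: str
    mode: str  # assumed modes: "exporter", "analyzer", "verifier"
    cancelled: bool = False


@dataclass
class UserQueue:
    """Pending verification jobs of a single user (hypothetical structure)."""
    jobs: deque = field(default_factory=deque)
    next_id: int = 1

    def submit(self, repo: str, mode: str) -> Job:
        # Commits to main or devel always require the full verifier.
        if repo in ("main", "devel"):
            mode = "verifier"
        job = Job(self.next_id, repo, mode)
        self.next_id += 1
        self.jobs.append(job)
        return job

    def cancel(self, job_id: int) -> bool:
        # Mark a pending job as cancelled; report whether one was found.
        for job in self.jobs:
            if job.job_id == job_id and not job.cancelled:
                job.cancelled = True
                return True
        return False

    def next_job(self):
        # Return the oldest job that has not been cancelled, or None.
        while self.jobs:
            job = self.jobs.popleft()
            if not job.cancelled:
                return job
        return None
```

Cancelled jobs are skipped lazily when the worker asks for the next job, which keeps cancellation cheap for the user.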



7 Multiple Wiki Servers and their Synchronization 

Mirroring is a common internet synchronization procedure used for a number of 
reasons. Mirroring increases availability by decreasing network latency in mul- 
tiple geographical locations. Mirroring also helps to balance network loads and 
supports backup of content. An internet mirror is live when it is changed 
immediately after its origin changes. With custom wiki software, such as MediaWiki^11 
(the wiki engine behind Wikipedia), there can typically be just one central 
repository to which updates are made. This is no longer such a limitation with a wiki 
such as ours, which is built on top of a distributed version control system. 

In the case of the Mizar part of our wiki, the practical motivation for mirroring 
already exists: there are currently three reasonably powerful servers (in Nijmegen, 
Edmonton, and Bialystok) where the wiki can be installed and provide all its ser- 
vices. Given that re-verification of the whole formal (e.g., Mizar) library is still a 
costly operation, distributing the work between these servers can be quite useful. 
An obvious concern is then however the desynchronization of the developments. 

This turns out to be easy to solve using the synchronization mechanism of 
a distributed version control system like Git. Git already comes with its own 
options for mirroring the changes in other repositories, which can be easily 
triggered using some of its hooks (in Git terminology, we are using the post-update 
hook on bare repositories). Because our wiki is "just" a Git repository (with 
all other functionalities implemented as appropriate hooks) that allows pushing 
into it as any other Git repository, it turns out that this mirroring functional- 
ity is immediately usable for live synchronization of our wikis. The process (for 
example, for two wikis) works as follows: 

— The wikis are initialized over the same Git repository. 

— A post-update hook is added to the frontend (bare) Git repository of each 
of the wikis, making a mirroring push (pushing of all new references) to the 
mirroring wiki's frontend repository. 



^10 For Mizar, one could run only the exporter, or also the analyzer, or the full verifier. 
These "compiler-like" stages actually do not have to be repeated in Mizar once they 
were run. 
^11 http://www.mediawiki.org/wiki/MediaWiki 



— Upon a successful commit/push to any of the wiki servers, the server that 
received the push automatically updates the mirroring wiki as well, triggering its 
verification and HTML-ization functions in exactly the same way as a normal push to 
the wiki triggers these wiki-updating functions. 
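
The mirroring step in the process above could be sketched as a small post-update hook. This is an illustration only, not our actual hook: the mirror URL and the function names are hypothetical, and we assume here that git push --mirror is an acceptable way to push all references (note that it also removes references on the mirror that no longer exist locally).

```python
import subprocess

# Hypothetical URL of the other wiki's frontend (bare) repository.
MIRROR_URL = "ssh://git@mirror.example.org/main.git"


def mirror_push_command(mirror_url: str) -> list:
    """Build the Git command that pushes all references to the mirror."""
    return ["git", "push", "--mirror", mirror_url]


def main() -> int:
    # Git runs the post-update hook after the pushed references have been
    # updated, so the mirror receives the state that was just accepted here.
    return subprocess.call(mirror_push_command(MIRROR_URL))
```

An executable file named post-update in the hooks directory of the bare repository would define these functions and end by calling main(). The push arriving at the mirror is then handled there like any other push.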

Note that this is easy with distributed version control systems such as Git, 
precisely because there is no concept of a central repository, so that all reposi- 
tories are equal to each other and implement the same functionality. It is easy 
also because from the very beginning, our wiki was designed to allow arbitrary 
remote pushes, not just standard wiki-like changes coming from web editing. 

This mechanism also allows us to have finer mirroring policies. For example, a 
realistic scenario is that each of the wiki servers by default mirrors only changes 
to the main public wiki branches/clones, and the private user branches are kept 
non-mirrored. This means that the potentially costly verification operation is 
not duplicated on the mirror(s) for local developments, and is done only when 
an important public change is made. 
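
Such a finer mirroring policy could be expressed as a simple filter on the repository name. The sketch below is an assumption on our part: it treats main, devel, and the release and hotfix repositories of Fig. 5 as public, and everything else (feature and user repositories) as local; the function name is hypothetical.

```python
import re

# Assumed set of public repositories, following the naming used in Fig. 5.
PUBLIC_REPO = re.compile(r"^(main|devel|(release|hotfix)/[a-zA-Z0-9].*)$")


def should_mirror(repo_name: str) -> bool:
    """Return True if a push to this repository should reach the mirrors."""
    return PUBLIC_REPO.match(repo_name) is not None
```

A post-update hook would consult should_mirror before pushing, so that private user repositories never trigger verification on the other servers.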

8 Conclusion and Further Issues 

We have outlined a number of steps for building on our first version of a formal 
mathematics wiki. Our aims naturally require us to make use of several disparate 
technologies, including cutting-edge ones such as smart filesystems that can cope 
with very large scale datasets. 

The ultimate aim of making formal mathematics more attractive and 
manageable to the everyday mathematician remains. Extending our idea of "research 
notebooks", we would eventually like to equip our wiki with an editor with which 
one's mathematical work could be carried out entirely on the web. Collaborative 
tools such as etherpad^12 are a natural target as well. Hooks into attractive, 
useful presentations of formal proofs such as Mamane's tmEgg and Tankink's 
Proviola [8] systems can help, and merging in powerful automation tools such 
as the MizAR system [11] is another obvious next step. 

At the moment, our wiki supports only Mizar and Coq. These are but two 
of the actively used systems for formalized mathematics; adding Isabelle and 
possibly HOL light are now within reach thanks to our experience with Mizar 
and Coq. Concerning Coq, we would like to take advantage of the ongoing Math 
Components project. 

Finally, we note that mappings between formal mathematics and the vast 
world of "informal" mathematics remain rather weak. Indeed, even links 
between formal repositories are rather underdeveloped. Linking formal mathematical 
texts to some informal counterparts, such as Wikipedia, PlanetMath^13, and 
Wolfram MathWorld^14, remains to be carried out. For Mizar, this has been achieved 
to some extent (providing Wikipedia-based mapping for about two hundred 

^12 http://etherpad.org/ 
^13 http://planetmath.org/ 
^14 http://mathworld.wolfram.com 



MML objects), but much remains to be done. It seems especially attractive, 
in the context of our wiki work, to build a well-connected corner of the World 
Wide Web linking formal and informal mathematics. 